WORLD INTELLECTUAL PROPERTY ORGANIZATION 
International Bureau 




PCT

INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)

(51) International Patent Classification: C12Q 1/68

(11) International Publication Number: WO 98/20165 A2

(43) International Publication Date: 14 May 1998 (14.05.98)

(21) International Application Number: PCT/US97/20313

(22) International Filing Date: 5 November 1997 (05.11.97)

(30) Priority Data: 60/030,455; 6 November 1996 (06.11.96); US

(71) Applicant (for all designated States except US): WHITEHEAD
INSTITUTE FOR BIOMEDICAL RESEARCH [US/US];
Nine Cambridge Center, Cambridge, MA 02142 (US).

(72) Inventors; and

(75) Inventors/Applicants (for US only): LANDER, Eric, S.
[US/US]; 151 Bishop Allen Drive, Cambridge, MA 02138
(US). WANG, David [CN/US]; Apartment 314, 276
Massachusetts Avenue, Arlington, MA 02173 (US). HUDSON,
Thomas [CA/US]; 361 Metcalfe Avenue, Westmount,
Quebec H3Z 2J2 (CA).

(74) Agents: GRANAHAN, Patricia et al.; Hamilton, Brook, Smith
& Reynolds, Two Militia Drive, Lexington, MA 02173
(US).

(81) Designated States: JP, US, European patent (AT, BE, CH, DE,
DK, ES, FI, FR, GB, GR, IE, IT, LU, MC, NL, PT, SE).

Published
Without international search report and to be republished
upon receipt of that report.

(54) Title: BIALLELIC MARKERS

(57) Abstract

The invention provides nucleic acid segments of the human genome including polymorphic sites. Allele-specific primers and probes
hybridizing to regions flanking these sites are also provided. The nucleic acids, primers and probes are used in applications such as forensics,
paternity testing, medicine and genetic analysis.



FOR THE PURPOSES OF INFORMATION ONLY

Codes used to identify States party to the PCT on the front pages of pamphlets publishing international applications under the PCT.

AL  Albania
AM  Armenia
AT  Austria
AU  Australia
AZ  Azerbaijan
BA  Bosnia and Herzegovina
BB  Barbados
BE  Belgium
BF  Burkina Faso
BG  Bulgaria
BJ  Benin
BR  Brazil
BY  Belarus
CA  Canada
CF  Central African Republic
CG  Congo
CH  Switzerland
CI  Côte d'Ivoire
CM  Cameroon
CN  China
CU  Cuba
CZ  Czech Republic
DE  Germany
DK  Denmark
EE  Estonia
ES  Spain
FI  Finland
FR  France
GA  Gabon
GB  United Kingdom
GE  Georgia
GH  Ghana
GN  Guinea
GR  Greece
HU  Hungary
IE  Ireland
IL  Israel
IS  Iceland
IT  Italy
JP  Japan
KE  Kenya
KG  Kyrgyzstan
KP  Democratic People's Republic of Korea
KR  Republic of Korea
KZ  Kazakstan
LC  Saint Lucia
LI  Liechtenstein
LK  Sri Lanka
LR  Liberia
LS  Lesotho
LT  Lithuania
LU  Luxembourg
LV  Latvia
MC  Monaco
MD  Republic of Moldova
MG  Madagascar
MK  The former Yugoslav Republic of Macedonia
ML  Mali
MN  Mongolia
MR  Mauritania
MW  Malawi
MX  Mexico
NE  Niger
NL  Netherlands
NO  Norway
NZ  New Zealand
PL  Poland
PT  Portugal
RO  Romania
RU  Russian Federation
SD  Sudan
SE  Sweden
SG  Singapore
SI  Slovenia
SK  Slovakia
SN  Senegal
SZ  Swaziland
TD  Chad
TG  Togo
TJ  Tajikistan
TM  Turkmenistan
TR  Turkey
TT  Trinidad and Tobago
UA  Ukraine
UG  Uganda
US  United States of America
UZ  Uzbekistan
VN  Viet Nam
YU  Yugoslavia
ZW  Zimbabwe




BIALLELIC MARKERS

RELATED APPLICATIONS 

This application claims priority to U.S. provisional
application Serial No. 60/030,455, filed November 6, 1996,
the entire teachings of which are incorporated herein by
reference.

BACKGROUND OF THE INVENTION 

The genomes of all organisms undergo spontaneous
mutation in the course of their continuing evolution,
generating variant forms of progenitor sequences (Gusella,
Ann. Rev. Biochem. 55, 831-854 (1986)). The variant form
may confer an evolutionary advantage or disadvantage
relative to a progenitor form or may be neutral. In some
instances, a variant form confers a lethal disadvantage and
is not transmitted to subsequent generations of the
organism. In other instances, a variant form confers an
evolutionary advantage to the species and is eventually
incorporated into the DNA of many or most members of the
species and effectively becomes the progenitor form. In
many instances, both progenitor and variant form(s) survive
and co-exist in a species population. The coexistence of
multiple forms of a sequence gives rise to polymorphisms.

Several different types of polymorphism have been
reported. A restriction fragment length polymorphism
(RFLP) is a variation in DNA sequence that alters the
length of a restriction fragment (Botstein et al., Am. J.
Hum. Genet. 32, 314-331 (1980)). The restriction fragment
length polymorphism may create or delete a restriction
site, thus changing the length of the restriction fragment.
RFLPs have been widely used in human and animal genetic
analyses (see WO 90/13668; WO 90/11369; Donis-Keller, Cell
51, 319-337 (1987); Lander et al., Genetics 121, 85-99
(1989)). When a heritable trait can be linked to a
particular RFLP, the presence of the RFLP in an individual
can be used to predict the likelihood that the animal will
also exhibit the trait.

Other polymorphisms take the form of short tandem
repeats (STRs) that include tandem di-, tri- and
tetra-nucleotide repeated motifs. These tandem repeats are also
referred to as variable number tandem repeat (VNTR)
polymorphisms. VNTRs have been used in identity and
paternity analysis (US 5,075,217; Armour et al., FEBS Lett.
307, 113-115 (1992); Horn et al., WO 91/14003; Jeffreys, EP
370,719), and in a large number of genetic mapping studies.

Other polymorphisms take the form of single nucleotide
variations between individuals of the same species. Such
polymorphisms are far more frequent than RFLPs, STRs and
VNTRs. Some single nucleotide polymorphisms occur in
protein-coding sequences, in which case, one of the
polymorphic forms may give rise to the expression of a
defective or other variant protein and, potentially, a
genetic disease. Examples of genes in which polymorphisms
within coding sequences give rise to genetic disease
include β-globin (sickle cell anemia) and CFTR (cystic
fibrosis). Other single nucleotide polymorphisms occur in
noncoding regions. Some of these polymorphisms may also
result in defective protein expression (e.g., as a result
of defective splicing). Other single nucleotide
polymorphisms have no phenotypic effects.

Single nucleotide polymorphisms can be used in the same
manner as RFLPs and VNTRs, but offer several advantages.
Single nucleotide polymorphisms occur with greater
frequency and are spaced more uniformly throughout the
genome than other forms of polymorphism. The greater
frequency and uniformity of single nucleotide polymorphisms
means that there is a greater probability that such a
polymorphism will be found in close proximity to a genetic
locus of interest than would be the case for other
polymorphisms. The different forms of characterized single
nucleotide polymorphisms are often easier to distinguish
than other types of polymorphism (e.g., by use of assays
employing allele-specific hybridization probes or primers).

Only a small percentage of the total repository of
polymorphisms in humans and other organisms has been
identified. The limited number of polymorphisms identified
to date is due to the large amount of work required for
their detection by conventional methods. For example, a
conventional approach to identifying polymorphisms might be
to sequence the same stretch of DNA in a population of
individuals by dideoxy sequencing. In this type of
approach, the amount of work increases in proportion to
both the length of sequence and the number of individuals
in a population and becomes impractical for large stretches
of DNA or large numbers of persons.

SUMMARY OF THE INVENTION

The invention provides nucleic acid sequences
comprising nucleic acid segments of from about 10 to about
200 bases as shown in the Table, column 7, including a
polymorphic site. Complements of these segments are also
included. The segments can be DNA or RNA, and can be
double- or single-stranded. Segments can be, for example,
10-20, 10-50 or 10-100 bases long. Preferred segments
include a biallelic polymorphic site. The base occupying
the polymorphic site in the segments can be the reference
(Table, column 3) or an alternative base (Table, column 4).

The invention further provides allele-specific
oligonucleotides that hybridize to a segment of a fragment
shown in the Table, column 7, or its complement. These
oligonucleotides can be probes or primers. Also provided
are isolated nucleic acids comprising a sequence shown in
the Table, column 7, or the complement thereto, in which
the polymorphic site within the sequence is occupied by a
base other than the reference base shown in the Table,
column 3.

The invention further provides a method of analyzing a
nucleic acid from an individual. The method determines
which base is present at any one of the polymorphic sites
shown in the Table. Optionally, a set of bases occupying
a set of the polymorphic sites shown in the Table is
determined. This type of analysis can be performed on a
number of individuals, who are tested for the presence of a
disease phenotype. The presence or absence of the disease
phenotype is then correlated with a base or set of bases
present at the polymorphic sites in the individuals tested.

DETAILED DESCRIPTION OF THE INVENTION
DEFINITIONS 

An oligonucleotide can be DNA or RNA, and single- or
double-stranded. Oligonucleotides can be naturally
occurring or synthetic, but are typically prepared by
synthetic means. The oligonucleotides of the present
invention can comprise all of an oligonucleotide sequence
presented in column 7 of the Table or a segment of such an
oligonucleotide which includes a polymorphic site.
Oligonucleotides can be all of a nucleic acid segment as
represented in column 7 of the Table; a nucleic acid
sequence which comprises a nucleic acid segment represented
in column 7 of the Table and additional nucleic acids
(present at either or both ends of a nucleic acid segment
of column 7); or a portion (fragment) of a nucleic acid
segment represented in column 7 of the Table which includes
a polymorphic site. Preferred oligonucleotides of the
invention include segments of DNA, or their complements,
which include any one of the polymorphic sites shown in the
Table. The segments can be between 5 and 250 bases, and,
in specific embodiments, are between 5-10, 5-20, 10-20,
10-50, 20-50 or 10-100 bases. The polymorphic site can occur
within any position of the segment. The segments can be
from any of the allelic forms of DNA shown in the Table.

Hybridization probes are oligonucleotides which bind in
a base-specific manner to a complementary strand of nucleic
acid. Such probes include peptide nucleic acids, as
described in Nielsen et al., Science 254, 1497-1500 (1991).
As used herein, the term primer refers to a
single-stranded oligonucleotide which acts as a point of
initiation of template-directed DNA synthesis under
appropriate conditions (e.g., in the presence of four
different nucleoside triphosphates and an agent for
polymerization, such as DNA or RNA polymerase or reverse
transcriptase) in an appropriate buffer and at a suitable
temperature. The appropriate length of a primer depends on
the intended use of the primer, but typically ranges from
15 to 30 nucleotides. Short primer molecules generally
require cooler temperatures to form sufficiently stable
hybrid complexes with the template. A primer need not
reflect the exact sequence of the template, but must be
sufficiently complementary to hybridize with a template.
The term primer site refers to the area of the target DNA
to which a primer hybridizes. The term primer pair refers
to a set of primers including a 5' (upstream) primer that
hybridizes with the 5' end of the DNA sequence to be
amplified and a 3' (downstream) primer that hybridizes with
the complement of the 3' end of the sequence to be
amplified.

As used herein, linkage describes the tendency of
genes, alleles, loci or genetic markers to be inherited
together as a result of their location on the same
chromosome. It can be measured by percent recombination
between the two genes, alleles, loci or genetic markers.

As used herein, polymorphism refers to the occurrence
of two or more genetically determined alternative sequences
or alleles in a population. A polymorphic marker or site
is the locus at which divergence occurs. Preferred markers
have at least two alleles, each occurring at a frequency of
greater than 1%, and more preferably greater than 10% or
20% of a selected population. A polymorphic locus may be
as small as one base pair. Polymorphic markers include
restriction fragment length polymorphisms, variable number
of tandem repeats (VNTRs), hypervariable regions,
minisatellites, dinucleotide repeats, trinucleotide
repeats, tetranucleotide repeats, simple sequence repeats,
and insertion elements such as Alu. The first identified
allelic form is arbitrarily designated as the reference
form and other allelic forms are designated as alternative
or variant alleles. The allelic form occurring most
frequently in a selected population is sometimes referred
to as the wildtype form. Diploid organisms may be
homozygous or heterozygous for allelic forms. A diallelic
or biallelic polymorphism has two forms. A triallelic
polymorphism has three forms.

A single nucleotide polymorphism occurs at a
polymorphic site occupied by a single nucleotide, which is
the site of variation between allelic sequences. The site
is usually preceded by and followed by highly conserved
sequences of the allele (e.g., sequences that vary in less
than 1/100 or 1/1000 members of the populations).

A single nucleotide polymorphism usually arises due to
substitution of one nucleotide for another at the
polymorphic site. A transition is the replacement of one
purine by another purine or one pyrimidine by another
pyrimidine. A transversion is the replacement of a purine
by a pyrimidine or vice versa. Single nucleotide
polymorphisms can also arise from a deletion of a
nucleotide or an insertion of a nucleotide relative to a
reference allele. Typically the polymorphic site is
occupied by a base other than the reference base. For
example, where the reference allele contains the base "T"
at the polymorphic site, the altered allele can contain a
"C", "G" or "A" at the polymorphic site.

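The distinction between transitions and transversions drawn in
the definition of a single nucleotide polymorphism can be
illustrated with a short sketch. The function name
classify_substitution is hypothetical and is not part of the
disclosure; the sketch assumes only the standard grouping of A and
G as purines and of C, T and U as pyrimidines.

```python
# Illustrative sketch only: classify a single-base substitution as a
# transition or a transversion. The function name is hypothetical.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T", "U"}


def classify_substitution(reference_base: str, alternative_base: str) -> str:
    """Return 'transition' or 'transversion' for a single-base change."""
    ref = reference_base.upper()
    alt = alternative_base.upper()
    for base in (ref, alt):
        if base not in PURINES | PYRIMIDINES:
            raise ValueError(f"not a nucleotide base: {base!r}")
    if ref == alt:
        raise ValueError("reference and alternative bases are identical")
    # Purine <-> purine or pyrimidine <-> pyrimidine is a transition;
    # any change between the two classes is a transversion.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"
```

For the example given above, in which the reference base is "T",
an altered allele carrying "C" is a transition, whereas altered
alleles carrying "G" or "A" are transversions.
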
Hybridizations are usually performed under stringent
conditions, for example, at a salt concentration of no more
than 1 M and a temperature of at least 25 °C. For example,
conditions of 5X SSPE (750 mM NaCl, 50 mM NaPhosphate, 5 mM
EDTA, pH 7.4) and a temperature of 25-30 °C, or equivalent
conditions, are suitable for allele-specific probe
hybridizations. Equivalent conditions can be determined by
varying one or more of the parameters given as an example,
as known in the art, while maintaining a similar degree of
identity or similarity between the target nucleotide
sequence and the primer or probe used.

The term "isolated" is used herein to indicate that the
material in question exists in a physical milieu distinct
from that in which it occurs in nature. For example, an
isolated nucleic acid of the invention may be substantially
isolated with respect to the complex cellular milieu in
which it naturally occurs. In some instances, the isolated
material will form part of a composition (for example, a
crude extract containing other substances), buffer system
or reagent mix. In other circumstances, the material may be
purified to essential homogeneity, for example as
determined by PAGE or column chromatography such as HPLC.
Preferably, an isolated nucleic acid comprises at least
about 50, 80 or 90 percent (on a molar basis) of all
macromolecular species present.

I. Novel Polymorphisms of the Invention 

The novel polymorphisms of the invention are listed in
the Table. The first column of the Table lists the names
assigned to the fragments in which the polymorphisms occur.
The fragments are all human genomic fragments. The
sequence of one allelic form of each of the fragments
(arbitrarily referred to as the prototypical or reference
form) has been previously published. These sequences are
listed at http://www-genome.wi.mit.edu/ (all STS's
(sequence tag sites)); http://shgc.stanford.edu (Stanford
STS's); and http://ww.tigr.org/ (TIGR STS's). The Web
sites also list primers for amplification of the fragments,
and the genomic location of fragments. Some fragments are
expressed sequence tags, and some are random genomic
fragments. All information in the websites concerning the
fragments listed in the Table is incorporated by reference
in its entirety for all purposes.

The second column lists the position in the fragment in
which a polymorphic site has been found. Positions are
numbered consecutively with the first base of the fragment
sequence as listed in one of the above databases being
assigned the number one. The third column lists the base
occupying the polymorphic site in the sequence in the data
base. This base is arbitrarily designated the reference or
prototypical form, but it is not necessarily the most
frequently occurring form. The fourth column in the Table
lists the alternative base(s) at the polymorphic site. The
fifth column of the Table lists a 5' (upstream or forward)
primer that hybridizes with the 5' end of the DNA sequence
to be amplified. The sixth column of the Table lists a 3'
(downstream or reverse) primer that hybridizes with the
complement of the 3' end of the sequence to be amplified.
The seventh column of the Table lists a number of bases of
sequence on either side of the polymorphic site in each
fragment. The indicated sequences can be either DNA or
RNA. In the latter, the T's shown in the Table are
replaced by U's. The base occupying the polymorphic site
is indicated in IUPAC-IUB ambiguity code.
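
As an illustration of how a column 7 entry might be read, the
following sketch expands a sequence containing one IUPAC-IUB
ambiguity code into its individual allelic forms and converts a
DNA sequence to the corresponding RNA sequence. The function names
and the example sequence are hypothetical and are not taken from
the Table; the sketch also assumes that the position supplied is
counted from the first base of the string passed in, with that
base numbered one.

```python
# Illustrative sketch only: expand an IUPAC-IUB ambiguity code at a
# polymorphic site into the individual allelic sequences.
# Function names and the example sequence are hypothetical.
IUPAC_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def allelic_forms(sequence: str, site_position: int) -> list[str]:
    """Return one sequence per base allowed at the 1-based site_position."""
    index = site_position - 1  # positions are numbered from one
    if not 0 <= index < len(sequence):
        raise ValueError("site_position lies outside the sequence")
    code = sequence[index].upper()
    if code not in IUPAC_CODES:
        raise ValueError(f"unknown IUPAC code: {code!r}")
    return [sequence[:index] + base + sequence[index + 1:]
            for base in IUPAC_CODES[code]]


def to_rna(dna_sequence: str) -> str:
    """Replace every T with U to give the corresponding RNA sequence."""
    return dna_sequence.upper().replace("T", "U")


if __name__ == "__main__":
    # Hypothetical flanking sequence with an A/G site (code R) at position 5.
    for form in allelic_forms("ACGTRCCA", 5):
        print(form, to_rna(form))
```
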

II. Analysis of Polymorphisms 
A. Preparation of Samples 

Polymorphisms are detected in a target nucleic acid
from an individual being analyzed. For assay of genomic
DNA, virtually any biological sample (other than pure red
blood cells) is suitable. For example, convenient tissue
samples include whole blood, semen, saliva, tears, urine,
fecal material, sweat, buccal, skin and hair. For assay of
cDNA or mRNA, the tissue sample must be obtained from an
organ in which the target nucleic acid is expressed. For
example, if the target nucleic acid is a cytochrome P450,
the liver is a suitable source.

Many of the methods described below require
amplification of DNA from target samples. This can be
accomplished by e.g., PCR. See generally PCR Technology:
Principles and Applications for DNA Amplification (ed. H.A.
Erlich, Freeman Press, NY, NY, 1992); PCR Protocols: A
Guide to Methods and Applications (eds. Innis, et al.,
Academic Press, San Diego, CA, 1990); Mattila et al.,
Nucleic Acids Res. 19, 4967 (1991); Eckert et al., PCR
Methods and Applications 1, 17 (1991); PCR (eds. McPherson
et al., IRL Press, Oxford); and U.S. Patent 4,683,202.

Other suitable amplification methods include the ligase
chain reaction (LCR) (see Wu and Wallace, Genomics 4, 560
(1989), Landegren et al., Science 241, 1077 (1988)),
transcription amplification (Kwoh et al., Proc. Natl. Acad.
Sci. USA 86, 1173 (1989)), and self-sustained sequence
replication (Guatelli et al., Proc. Nat. Acad. Sci. USA
87, 1874 (1990)) and nucleic acid based sequence
amplification (NASBA). The latter two amplification
methods involve isothermal reactions based on isothermal
transcription, which produce both single stranded RNA
(ssRNA) and double stranded DNA (dsDNA) as the
amplification products in a ratio of about 30 or 100 to 1,
respectively.

B. Detection of Polymorphisms in Target DNA

There are two distinct types of analysis of target DNA
for detecting polymorphisms. The first type of analysis,
sometimes referred to as de novo characterization, is
carried out to identify polymorphic sites not previously
characterized (i.e., to identify new polymorphisms). This
analysis compares target sequences in different individuals
to identify points of variation, i.e., polymorphic sites.
By analyzing groups of individuals representing the
greatest ethnic diversity among humans and greatest breed
and species variety in plants and animals, patterns
characteristic of the most common alleles/haplotypes of the
locus can be identified, and the frequencies of such
alleles/haplotypes in the population can be determined.
Additional allelic frequencies can be determined for
subpopulations characterized by criteria such as geography,
race, or gender. The de novo identification of
polymorphisms of the invention is described in the Examples
section. The second type of analysis determines which
form(s) of a characterized (known) polymorphism are present
in individuals under test. There are a variety of suitable
procedures, which are discussed in turn.

1. Allele-Specific Probes

The design and use of allele-specific probes for
analyzing polymorphisms is described by e.g., Saiki et al.,
Nature 324, 163-166 (1986); Dattagupta, EP 235,726; Saiki,
WO 89/11548. Allele-specific probes can be designed that
hybridize to a segment of target DNA from one individual
but do not hybridize to the corresponding segment from
another individual due to the presence of different
polymorphic forms in the respective segments from the two
individuals. Hybridization conditions should be
sufficiently stringent that there is a significant
difference in hybridization intensity between alleles, and
preferably an essentially binary response, whereby a probe
hybridizes to only one of the alleles. Some probes are
designed to hybridize to a segment of target DNA such that
the polymorphic site aligns with a central position (e.g.,
in a 15-mer at the 7 position; in a 16-mer, at either the 8
or 9 position) of the probe. This design of probe achieves
good discrimination in hybridization between different
allelic forms.

Allele-specific probes are often used in pairs, one
member of a pair showing a perfect match to a reference
form of a target sequence and the other member showing a
perfect match to a variant form. Several pairs of probes
can then be immobilized on the same support for
simultaneous analysis of multiple polymorphisms within the
same target sequence.
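
The pairing of a reference-matched probe with a variant-matched
probe, each with the polymorphic site near the center, can be
sketched as follows. This is a minimal illustration under stated
assumptions, not a description of how the probes of the invention
were actually designed: the function names and the example
sequence are hypothetical, the site is placed by default at
position 7 of a 15-mer or position 8 of a 16-mer in keeping with
the examples given above, and each probe is written on the same
strand as the input sequence (in practice the probe or its
complement would be chosen according to the target strand).

```python
# Illustrative sketch only: build a pair of allele-specific probes with
# the polymorphic site at a central position. Names are hypothetical.
from typing import Optional


def probe_for_allele(sequence: str, site_index: int, allele: str,
                     length: int = 15,
                     site_position: Optional[int] = None) -> str:
    """Return a probe of the given length carrying `allele` at the site.

    site_index is the 0-based index of the polymorphic site in `sequence`;
    site_position is the 1-based position of the site within the probe
    (default: length // 2, i.e. 7 for a 15-mer and 8 for a 16-mer).
    """
    if site_position is None:
        site_position = length // 2
    start = site_index - (site_position - 1)
    end = start + length
    if start < 0 or end > len(sequence):
        raise ValueError("not enough flanking sequence for this probe")
    probe = sequence[start:site_index] + allele + sequence[site_index + 1:end]
    return probe.upper()


def probe_pair(sequence: str, site_index: int, reference_base: str,
               alternative_base: str, length: int = 15) -> tuple[str, str]:
    """Return (reference-matched probe, variant-matched probe)."""
    return (probe_for_allele(sequence, site_index, reference_base, length),
            probe_for_allele(sequence, site_index, alternative_base, length))


if __name__ == "__main__":
    # Hypothetical 22-base fragment with a T/C site at 0-based index 10.
    fragment = "ACGTACGTACTGCATGCATGCA"
    print(probe_pair(fragment, 10, "T", "C"))
```

The two probes returned differ at exactly one position, so any
difference in hybridization intensity between them can be
attributed to the base at the polymorphic site.
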

2. Tiling Arrays

The polymorphisms can also be identified by
hybridization to nucleic acid arrays, some examples of
which are described in WO 95/11995. One form of such
arrays is described in the Examples section in connection
with de novo identification of polymorphisms. The same
array or a different array can be used for analysis of
characterized polymorphisms. WO 95/11995 also describes
subarrays that are optimized for detection of a variant
form of a precharacterized polymorphism. Such a subarray
contains probes designed to be complementary to a second
reference sequence, which is an allelic variant of the
first reference sequence. The second group of probes is
designed by the same principles as described in the
Examples, except that the probes exhibit complementarity to
the second reference sequence. The inclusion of a second
group (or further groups) can be particularly useful for
analyzing short subsequences of the primary reference
sequence in which multiple mutations are expected to occur
within a short distance commensurate with the length of the
probes (e.g., two or more mutations within 9 to 21 bases).

3. Allele-Specific Primers

An allele-specific primer hybridizes to a site on
target DNA overlapping a polymorphism and only primes
amplification of an allelic form to which the primer
exhibits perfect complementarity. See Gibbs, Nucleic Acid
Res. 17, 2427-2448 (1989). This primer is used in
conjunction with a second primer which hybridizes at a
distal site. Amplification proceeds from the two primers,
resulting in a detectable product which indicates the
particular allelic form is present. A control is usually
performed with a second pair of primers, one of which shows
a single base mismatch at the polymorphic site and the
other of which exhibits perfect complementarity to a distal
site. The single-base mismatch prevents amplification and
no detectable product is formed. The method works best
when the mismatch is included in the 3'-most position of
the oligonucleotide aligned with the polymorphism because
this position is most destabilizing to elongation from the
primer (see, e.g., WO 93/22456).
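
The placement of the polymorphic base at the 3'-most position of
an allele-specific primer can be sketched as below. The sketch is
an idealized illustration, not the procedure of the cited
references: the function names and the example sequence are
hypothetical, the primer is written on the same strand as the
input sequence, and amplification is modeled as succeeding only
when the 3'-terminal base of the primer matches the allele present
in the template.

```python
# Illustrative sketch only: allele-specific primers whose 3'-most base
# lies on the polymorphic site. Names and the model are hypothetical.
def allele_specific_primer(sequence: str, site_index: int, allele: str,
                           length: int = 20) -> str:
    """Return a primer ending (3' end) on the polymorphic site.

    site_index is the 0-based index of the polymorphic site in `sequence`.
    """
    start = site_index - length + 1
    if start < 0:
        raise ValueError("not enough upstream sequence for this primer")
    return (sequence[start:site_index] + allele).upper()


def primes_amplification(primer: str, template_allele: str) -> bool:
    """Idealized model: a 3'-terminal mismatch prevents amplification."""
    return primer[-1] == template_allele.upper()


if __name__ == "__main__":
    # Hypothetical fragment with a T/C site at 0-based index 10.
    fragment = "ACGTACGTACTGCATGCATGCA"
    matched = allele_specific_primer(fragment, 10, "T", length=8)
    control = allele_specific_primer(fragment, 10, "C", length=8)
    print(matched, primes_amplification(matched, "T"))
    print(control, primes_amplification(control, "T"))
```
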

4. Direct-Sequencing

The direct analysis of the sequence of polymorphisms of
the present invention can be accomplished using either the
dideoxy chain termination method or the Maxam Gilbert
method (see Sambrook et al., Molecular Cloning, A
Laboratory Manual (2nd Ed., CSHP, New York 1989); Zyskind
et al., Recombinant DNA Laboratory Manual, (Acad. Press,
1988)).

5. Denaturing Gradient Gel Electrophoresis
Amplification products generated using the polymerase
chain reaction can be analyzed by the use of denaturing
gradient gel electrophoresis. Different alleles can be
identified based on the different sequence-dependent
melting properties and electrophoretic migration of DNA in
solution. Erlich, ed., PCR Technology, Principles and
Applications for DNA Amplification, (W.H. Freeman and Co.,
New York, 1992), Chapter 7.

6. Single-Strand Conformation Polymorphism Analysis

Alleles of target sequences can be differentiated using
single-strand conformation polymorphism analysis, which
identifies base differences by alteration in
electrophoretic migration of single stranded PCR products,
as described in Orita et al., Proc. Nat. Acad. Sci. 86,
2766-2770 (1989). Amplified PCR products can be generated
as described above, and heated or otherwise denatured, to
form single stranded amplification products.
Single-stranded nucleic acids may refold or form secondary
structures which are partially dependent on the base
sequence. The different electrophoretic mobilities of
single-stranded amplification products can be related to
base-sequence differences between alleles of target
sequences.



III. Methods of Use

After determining polymorphic form(s) present in an
individual at one or more polymorphic sites, this
information can be used in a number of methods.

A. Forensics

Determination of which polymorphic forms occupy a set
of polymorphic sites in an individual identifies a set of
polymorphic forms that distinguishes the individual. See
generally National Research Council, The Evaluation of
Forensic DNA Evidence (Eds. Pollard et al., National
Academy Press, DC, 1996). The more sites that are
analyzed, the lower the probability that the set of
polymorphic forms in one individual is the same as that in
an unrelated individual. Preferably, if multiple sites are
analyzed, the sites are unlinked. Thus, polymorphisms of
the invention are often used in conjunction with
polymorphisms in distal genes. Preferred polymorphisms for
use in forensics are biallelic because the population
frequencies of two polymorphic forms can usually be
determined with greater accuracy than those of multiple
polymorphic forms at multi-allelic loci.

The capacity to identify a distinguishing or unique set
of forensic markers in an individual is useful for forensic
analysis. For example, one can determine whether a blood
sample from a suspect matches a blood or other tissue
sample from a crime scene by determining whether the set of
polymorphic forms occupying selected polymorphic sites is
the same in the suspect and the sample. If the set of
polymorphic markers does not match between a suspect and a
sample, it can be concluded (barring experimental error)
that the suspect was not the source of the sample. If the
set of markers does match, one can conclude that the DNA
from the suspect is consistent with that found at the crime
scene. If frequencies of the polymorphic forms at the loci
tested have been determined (e.g., by analysis of a
suitable population of individuals), one can perform a
statistical analysis to determine the probability that a
match of suspect and crime scene sample would occur by
chance.

p(ID) is the probability that two random individuals
have the same polymorphic or allelic form at a given
polymorphic site. In biallelic loci, four genotypes are
possible: AA, AB, BA, and BB. If alleles A and B occur in
a haploid genome of the organism with frequencies x and y,
the probability of each genotype in a diploid organism is
(see WO 95/12607):

    Homozygote: p(AA) = x^2
    Homozygote: p(BB) = y^2 = (1-x)^2
    Single Heterozygote: p(AB) = p(BA) = xy = x(1-x)
    Both Heterozygotes: p(AB+BA) = 2xy = 2x(1-x)

The probability of identity at one locus (i.e., the
probability that two individuals, picked at random from a
population, will have identical polymorphic forms at a given
locus) is given by the equation:

    p(ID) = (x^2)^2 + (2xy)^2 + (y^2)^2

These calculations can be extended for any number of
polymorphic forms at a given locus. For example, the
probability of identity p(ID) for a 3-allele system where
the alleles have the frequencies in the population of x, y
and z, respectively, is equal to the sum of the squares of
the genotype frequencies:

    p(ID) = x^4 + (2xy)^2 + (2yz)^2 + (2xz)^2 + z^4 + y^4

In a locus of n alleles, the appropriate binomial
expansion is used to calculate p(ID) and p(exc).

The cumulative probability of identity (cum p(ID)) for
each of multiple unlinked loci is determined by multiplying
the probabilities provided by each locus.

    cum p(ID) = p(ID1)p(ID2)p(ID3) ... p(IDn)
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The cumulative probability of non- identity for n loci 
(i.e. the probability that two random individuals will be 
different at 1 or more loci) is given by the equation: 

cum p(nonlD) = 1-cum p(ID) . 
5 If several polymorphic loci are tested, the cumulative 

probability of non-identity for random individuals becomes 
very high (e.g., one billion to one) . Such probabilities 
can be taken into account together with other evidence in 
determining the guilt or innocence of the suspect. 
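
The cumulative calculation can be sketched as follows, assuming the loci are unlinked so that the per-locus probabilities multiply. The function names and the example values are hypothetical.

```python
from math import prod


def cumulative_identity(per_locus_pid):
    """cum p(ID): product of the per-locus probabilities of identity."""
    return prod(per_locus_pid)


def cumulative_non_identity(per_locus_pid):
    """cum p(nonID) = 1 - cum p(ID)."""
    return 1.0 - cumulative_identity(per_locus_pid)


if __name__ == "__main__":
    # Thirty hypothetical biallelic loci, each with p(ID) = 0.375.
    loci = [0.375] * 30
    print(cumulative_identity(loci))
    print(cumulative_non_identity(loci))
```

Because every per-locus p(ID) is less than 1, the product shrinks quickly as loci are added, which is why a panel of many biallelic markers can still be highly discriminating.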

B. Paternity Testing

The object of paternity testing is usually to determine
whether a male is the father of a child. In most cases,
the mother of the child is known and thus, the mother's
contribution to the child's genotype can be traced.
Paternity testing investigates whether the part of the
child's genotype not attributable to the mother is
consistent with that of the putative father. Paternity
testing can be performed by analyzing sets of polymorphisms
in the putative father and the child.

If the set of polymorphisms in the child attributable
to the father does not match the set of polymorphisms of
the putative father, it can be concluded, barring
experimental error, that the putative father is not the
real father. If the set of polymorphisms in the child
attributable to the father does match the set of
polymorphisms of the putative father, a statistical
calculation can be performed to determine the probability
of coincidental match.

The probability of parentage exclusion (representing
the probability that a random male will have a polymorphic
form at a given polymorphic site that makes him
incompatible as the father) is given by the equation (see
WO 95/12607):

p(exc) = xy(1-xy)

where x and y are the population frequencies of alleles A
and B of a biallelic polymorphic site.

(At a triallelic site p(exc) = xy(1-xy) + yz(1-yz) +
xz(1-xz) + 3xyz(1-xyz), where x, y and z are the
respective population frequencies of alleles A, B and C.)

The probability of non-exclusion is

p(non-exc) = 1 - p(exc)

The cumulative probability of non-exclusion
(representing the value obtained when n loci are used) is
thus:

cum p(non-exc) = p(non-exc1) p(non-exc2) p(non-exc3) ... p(non-excn)

The cumulative probability of exclusion for n loci
(representing the probability that a random male will be
excluded) is:

cum p(exc) = 1 - cum p(non-exc)

If several polymorphic loci are included in the
analysis, the cumulative probability of exclusion of a
random male is very high. This probability can be taken
into account in assessing the liability of a putative
father whose polymorphic marker set matches the child's
polymorphic marker set attributable to his/her father.
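
The exclusion formulas of this section can be sketched in Python for the biallelic case. The function names are hypothetical, and the sketch assumes each locus is described only by the frequency x of allele A, with y = 1-x.

```python
from math import prod


def exclusion_probability(x):
    """p(exc) = xy(1 - xy) at a biallelic site with allele frequencies x and y."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    xy = x * (1.0 - x)
    return xy * (1.0 - xy)


def cumulative_exclusion(allele_a_freqs):
    """cum p(exc) = 1 - product over loci of (1 - p(exc))."""
    cum_non_exc = prod(1.0 - exclusion_probability(x) for x in allele_a_freqs)
    return 1.0 - cum_non_exc


if __name__ == "__main__":
    print(exclusion_probability(0.5))
    print(cumulative_exclusion([0.5] * 20))
```

At x = 0.5 a single locus gives p(exc) = 0.25 * 0.75 = 0.1875, so many loci are needed before the cumulative probability of exclusion approaches 1.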

C. Correlation of Polymorphisms with Phenotypic Traits

The polymorphisms of the invention may contribute to
the phenotype of an organism in different ways. Some
polymorphisms occur within a protein coding sequence and
contribute to phenotype by affecting protein structure.
The effect may be neutral, beneficial or detrimental, or
both beneficial and detrimental, depending on the
circumstances. For example, a heterozygous sickle cell
mutation confers resistance to malaria, but a homozygous
sickle cell mutation is usually lethal. Other
polymorphisms occur in noncoding regions but may exert
phenotypic effects indirectly via influence on replication,
transcription, and translation. A single polymorphism may
affect more than one phenotypic trait. Likewise, a single
phenotypic trait may be affected by polymorphisms in
different genes. Further, some polymorphisms predispose an
individual to a distinct mutation that is causally related
to a certain phenotype.

Phenotypic traits include diseases that have known but
hitherto unmapped genetic components (e.g.,
agammaglobulinemia, diabetes insipidus, Lesch-Nyhan
syndrome, muscular dystrophy, Wiskott-Aldrich syndrome,
Fabry's disease, familial hypercholesterolemia, polycystic
kidney disease, hereditary spherocytosis, von Willebrand's
disease, tuberous sclerosis, hereditary hemorrhagic
telangiectasia, familial colonic polyposis, Ehlers-Danlos
syndrome, osteogenesis imperfecta, and acute intermittent
porphyria). Phenotypic traits also include symptoms of, or
susceptibility to, multifactorial diseases of which a
component is or may be genetic, such as autoimmune
diseases, inflammation, cancer, diseases of the nervous
system, and infection by pathogenic microorganisms. Some
examples of autoimmune diseases include rheumatoid
arthritis, multiple sclerosis, diabetes (insulin-dependent
and non-insulin-dependent), systemic lupus erythematosus and
Graves disease. Some examples of cancers include cancers
of the bladder, brain, breast, colon, esophagus, kidney,
leukemia, liver, lung, oral cavity, ovary, pancreas,
prostate, skin, stomach and uterus. Phenotypic traits also
include characteristics such as longevity, appearance
(e.g., baldness, obesity), strength, speed, endurance,
fertility, and susceptibility or receptivity to particular
drugs or therapeutic treatments.

Correlation is performed for a population of
individuals who have been tested for the presence or
absence of a phenotypic trait of interest and for
polymorphic marker sets. To perform such analysis, the
presence or absence of a set of polymorphisms (i.e., a
polymorphic set) is determined for a set of the
individuals, some of whom exhibit a particular trait, and
some of whom exhibit lack of the trait. The alleles of
each polymorphism of the set are then reviewed to determine
whether the presence or absence of a particular allele is
associated with the trait of interest. Correlation can be
performed by standard statistical methods such as a
chi-squared test, and statistically significant correlations
between polymorphic form(s) and phenotypic characteristics
are noted. For example, it might be found that the
presence of allele A1 at polymorphism A correlates with
heart disease. As a further example, it might be found
that the combined presence of allele A1 at polymorphism A
and allele B1 at polymorphism B correlates with increased
milk production of a farm animal.
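
One way such a test might be carried out for a single allele is a Pearson chi-squared test on a 2x2 table of allele presence against trait presence. The sketch below is an assumption about how the counts could be arranged, not a method taken from the application; it uses no continuity correction, and the function name and example counts are hypothetical.

```python
from math import erfc, sqrt


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test (1 degree of freedom, no continuity correction).

    a: trait present, allele present    b: trait present, allele absent
    c: trait absent,  allele present    d: trait absent,  allele absent
    Returns (chi-squared statistic, p-value).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts: allele A1 seen in 30 of 40 affected
    # and 10 of 40 unaffected individuals.
    print(chi_squared_2x2(30, 10, 10, 30))
```

With many polymorphisms tested against the same trait, a correction for multiple comparisons would normally be applied before calling any single correlation significant.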

Such correlations can be exploited in several ways. In
the case of a strong correlation between a set of one or
more polymorphic forms and a disease for which treatment is
available, detection of the polymorphic form set in a human
or animal patient may justify immediate administration of
treatment, or at least the institution of regular
monitoring of the patient. Detection of a polymorphic form
correlated with serious disease in a couple contemplating a
family may also be valuable to the couple in their
reproductive decisions. For example, the female partner
might elect to undergo in vitro fertilization to avoid the
possibility of transmitting such a polymorphism from her
husband to her offspring. In the case of a weaker, but
still statistically significant correlation between a
polymorphic set and human disease, immediate therapeutic
intervention or monitoring may not be justified.
Nevertheless, the patient can be motivated to begin simple
life-style changes (e.g., diet, exercise) that can be
accomplished at little cost to the patient but confer
potential benefits in reducing the risk of conditions to
which the patient may have increased susceptibility by
virtue of variant alleles. Identification of a polymorphic
set in a patient correlated with enhanced receptiveness to
one of several treatment regimes for a disease indicates
that this treatment regime should be followed.

For animals and plants, correlations between
characteristics and phenotype are useful for breeding for
desired characteristics. For example, Beitz et al., US
5,292,639 discuss use of bovine mitochondrial polymorphisms
in a breeding program to improve milk production in cows.
To evaluate the effect of mtDNA D-loop sequence
polymorphism on milk production, each cow was assigned a
value of 1 if variant or 0 if wildtype with respect to a
prototypical mitochondrial DNA sequence at each of 17
locations considered. Each production trait was analyzed
individually with the following animal model:

Y_ijkpn = μ + YS_i + P_j + X_k + β_1 + ... + β_17 + PE_n + a_n + e_p

where Y_ijkpn is the milk, fat, fat percentage, SNF, SNF
percentage, energy concentration, or lactation energy
record; μ is an overall mean; YS_i is the effect common to
all cows calving in year-season; X_k is the effect common to
cows in either the high or average selection line; β_1 to
β_17 are the binomial regressions of production record on
mtDNA D-loop sequence polymorphisms; PE_n is permanent
environmental effect common to all records of cow n; a_n is
effect of animal n and is composed of the additive genetic
contribution of sire and dam breeding values and a
Mendelian sampling effect; and e_p is a random residual. It
was found that eleven of seventeen polymorphisms tested
influenced at least one production trait. Bovines having
the best polymorphic forms for milk production at these
eleven loci are used as parents for breeding the next
generation of the herd.

D. Genetic Mapping of Phenotypic Traits

The previous section concerns identifying correlations
between phenotypic traits and polymorphisms that directly
or indirectly contribute to those traits. The present
section describes identification of a physical linkage
between a genetic locus associated with a trait of interest
and polymorphic markers that are not associated with the
trait, but are in physical proximity with the genetic locus
responsible for the trait and co-segregate with it. Such
analysis is useful for mapping a genetic locus associated
with a phenotypic trait to a chromosomal position, and
thereby cloning gene(s) responsible for the trait. See
Lander et al., Proc. Natl. Acad. Sci. (USA) 83, 7353-7357
(1986); Lander et al., Proc. Natl. Acad. Sci. (USA) 84,
2363-2367 (1987); Donis-Keller et al., Cell 51, 319-337
(1987); Lander et al., Genetics 121, 185-199 (1989).
Genes localized by linkage can be cloned by a process known
as directional cloning. See Wainwright, Med. J. Australia
159, 170-174 (1993); Collins, Nature Genetics 1, 3-6
(1992).

Linkage studies are typically performed on members of a
family. Available members of the family are characterized
for the presence or absence of a phenotypic trait and for a
set of polymorphic markers. The distribution of
polymorphic markers in an informative meiosis is then
analyzed to determine which polymorphic markers
co-segregate with a phenotypic trait. See, e.g., Kerem et
al., Science 245, 1073-1080 (1989); Monaco et al., Nature
316, 842 (1985); Yamoka et al., Neurology 40, 222-226
(1990); Rossiter et al., FASEB Journal 5, 21-27 (1991).

Linkage is analyzed by calculation of LOD (log of the
odds) values. A lod value is the relative likelihood of
obtaining observed segregation data for a marker and a
genetic locus when the two are located at a recombination
fraction θ, versus the situation in which the two are not
linked, and thus segregating independently (Thompson &
Thompson, Genetics in Medicine (5th ed., W.B. Saunders
Company, Philadelphia, 1991); Strachan, "Mapping the human
genome" in The Human Genome (BIOS Scientific Publishers
Ltd, Oxford), Chapter 4). A series of likelihood ratios
are calculated at various recombination fractions (θ),
ranging from θ = 0.0 (coincident loci) to θ = 0.50
(unlinked). Thus, the likelihood at a given value of θ is
the ratio of the probability of data if loci are linked at
θ to the probability of data if loci are unlinked. The
computed likelihoods are usually expressed as the log10 of
this ratio (i.e., a lod score). For example, a lod score
of 3 indicates 1000:1 odds against an apparent observed
linkage being a coincidence. The use of logarithms allows
data collected from different families to be combined by
simple addition. Computer programs are available for the
calculation of lod scores for differing values of θ (e.g.,
LIPED, MLINK (Lathrop, Proc. Natl. Acad. Sci. (USA) 81,
3443-3446 (1984))). For any particular lod score, a
recombination fraction may be determined from mathematical
tables. See Smith et al., Mathematical tables for research
workers in human genetics (Churchill, London, 1961); Smith,
Ann. Hum. Genet. 32, 127-150 (1968). The value of θ at
which the lod score is the highest is considered to be the
best estimate of the recombination fraction.
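
For the simplest case the lod calculation can be written out directly. The sketch below assumes phase-known, fully informative meioses, so that the data reduce to k recombinants among n meioses; this simplification, the grid search and the function names are illustrative assumptions and are not how LIPED or MLINK operate on general pedigrees.

```python
from math import log10


def lod_score(recombinants, meioses, theta):
    """log10 of L(theta) / L(0.5) for k recombinants among n scored meioses."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0.0 and 0.5")
    k, n = recombinants, meioses
    likelihood_linked = theta ** k * (1.0 - theta) ** (n - k)
    if likelihood_linked == 0.0:
        # theta = 0 is incompatible with any observed recombinant.
        return float("-inf")
    return log10(likelihood_linked) - n * log10(0.5)


def best_theta(recombinants, meioses, steps=50):
    """Grid value of theta (0.0 to 0.5) at which the lod score is highest."""
    grid = [0.5 * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda t: lod_score(recombinants, meioses, t))


if __name__ == "__main__":
    theta_hat = best_theta(1, 10)
    print(theta_hat, lod_score(1, 10, theta_hat))
```

With no recombinants in ten meioses the lod score at θ = 0 is 10 x log10(2), about 3.01, which is why roughly ten informative meioses are needed to reach the conventional threshold.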

Positive lod score values suggest that the two loci are
linked, whereas negative values suggest that linkage is
less likely (at that value of θ) than the possibility that
the two loci are unlinked. By convention, a combined lod
score of +3 or greater (equivalent to greater than 1000:1
odds in favor of linkage) is considered definitive evidence
that two loci are linked. Similarly, by convention, a
negative lod score of -2 or less is taken as definitive
evidence against linkage of the two loci being compared.
Negative linkage data are useful in excluding a chromosome
or a segment thereof from consideration. The search
focuses on the remaining non-excluded chromosomal
locations.
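
The conventions above (addition across families, +3 for linkage, -2 for exclusion) can be sketched as follows. The function names, the category labels and the per-family values are hypothetical; the per-family lod scores are assumed to have been computed at the same value of θ.

```python
def combined_lod(family_lods):
    """Combine lod scores from different families by simple addition."""
    return sum(family_lods)


def interpret_lod(total):
    """Apply the conventional thresholds of +3 and -2 to a combined lod score."""
    if total >= 3.0:
        return "linked"
    if total <= -2.0:
        return "linkage excluded"
    return "inconclusive"


if __name__ == "__main__":
    families = [1.2, 1.1, 0.9]  # hypothetical per-family lod scores
    total = combined_lod(families)
    print(total, interpret_lod(total))
```

A chromosomal segment whose markers all give "linkage excluded" can be dropped from the search, leaving the remaining segments for further typing.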

IV. Modified Polypeptides and Gene Sequences

The invention further provides variant forms of nucleic
acids and corresponding proteins. The nucleic acids
comprise one of the sequences described in the Table,
column 8, in which the polymorphic position is occupied by
one of the alternative bases for that position. Some
nucleic acids encode full-length variant forms of proteins.
Similarly, variant proteins have the prototypical amino
acid sequences encoded by nucleic acid sequences shown in
the Table, column 8 (read so as to be in-frame with the
full-length coding sequence of which it is a component),
except at an amino acid encoded by a codon including one of
the polymorphic positions shown in the Table. That
position is occupied by the amino acid coded by the
corresponding codon in any of the alternative forms shown
in the Table.

Variant genes can be expressed in an expression vector
in which a variant gene is operably linked to a native or
other promoter. Usually, the promoter is a eukaryotic
promoter for expression in a mammalian cell. The
transcription regulation sequences typically include a
heterologous promoter and optionally an enhancer which is
recognized by the host. The selection of an appropriate
promoter, for example trp, lac, phage promoters, glycolytic
enzyme promoters and tRNA promoters, depends on the host
selected. Commercially available expression vectors can be
used. Vectors can include host-recognized replication
systems, amplifiable genes, selectable markers, host
sequences useful for insertion into the host genome, and
the like.

The means of introducing the expression construct into
a host cell varies depending upon the particular
construction and the target host. Suitable means include
fusion, conjugation, transfection, transduction,
electroporation or injection, as described in Sambrook,
supra. A wide variety of host cells can be employed for
expression of the variant gene, both prokaryotic and
eukaryotic. Suitable host cells include bacteria such as
E. coli, yeast, filamentous fungi, insect cells, mammalian
cells, typically immortalized, e.g., mouse, CHO, human and
monkey cell lines and derivatives thereof. Preferred host
cells are able to process the variant gene product to
produce an appropriate mature polypeptide. Processing
includes glycosylation, ubiquitination, disulfide bond
formation, general post-translational modification, and the
like.

The protein may be isolated by conventional means of
protein biochemistry and purification to obtain a
substantially pure product, i.e., 80, 95 or 99% free of
cell component contaminants, as described in Jacoby,
Methods in Enzymology Volume 104, Academic Press, New York
(1984); Scopes, Protein Purification, Principles and
Practice, 2nd Edition, Springer-Verlag, New York (1987);
and Deutscher (ed.), Guide to Protein Purification, Methods
in Enzymology, Vol. 182 (1990). If the protein is
secreted, it can be isolated from the supernatant in which
the host cell is grown. If not secreted, the protein can
be isolated from a lysate of the host cells.

The invention further provides transgenic nonhuman
animals capable of expressing an exogenous variant gene
and/or having one or both alleles of an endogenous variant
gene inactivated. Expression of an exogenous variant gene
is usually achieved by operably linking the gene to a
promoter and optionally an enhancer, and microinjecting the
construct into a zygote. See Hogan et al., "Manipulating
the Mouse Embryo, A Laboratory Manual," Cold Spring Harbor
Laboratory. Inactivation of endogenous variant genes can
be achieved by forming a transgene in which a cloned
variant gene is inactivated by insertion of a positive
selection marker. See Capecchi, Science 244, 1288-1292
(1989). The transgene is then introduced into an embryonic
stem cell, where it undergoes homologous recombination with
an endogenous variant gene. Mice and other rodents are
preferred animals. Such animals provide useful drug
screening systems.

In addition to substantially full-length polypeptides
expressed by variant genes, the present invention includes
biologically active fragments of the polypeptides, or
analogs thereof, including organic molecules which simulate
the interactions of the peptides. Biologically active
fragments include any portion of the full-length
polypeptide which confers a biological function on the
variant gene product, including ligand binding, and
antibody binding. Ligand binding includes binding by
nucleic acids, proteins or polypeptides, small biologically
active molecules, or large cellular structures.

Polyclonal and/or monoclonal antibodies that
specifically bind to variant gene products but not to
corresponding prototypical gene products are also provided.
Antibodies can be made by injecting mice or other animals
with the variant gene product or synthetic peptide
fragments thereof. Monoclonal antibodies are screened as
described, for example, in Harlow & Lane, Antibodies, A
Laboratory Manual, Cold Spring Harbor Press, New York
(1988); Goding, Monoclonal antibodies, Principles and
Practice (2d ed.) Academic Press, New York (1986).
Monoclonal antibodies are tested for specific
immunoreactivity with a variant gene product and lack of
immunoreactivity to the corresponding prototypical gene
product. These antibodies are useful in diagnostic assays
for detection of the variant form, or as an active
ingredient in a pharmaceutical composition.

V. Kits

The invention further provides kits comprising at least
one allele-specific oligonucleotide as described above.
Often, the kits contain one or more pairs of
allele-specific oligonucleotides hybridizing to different
forms of a polymorphism. In some kits, the allele-specific
oligonucleotides are provided immobilized to a substrate.
For example, the same substrate can comprise
allele-specific oligonucleotide probes for detecting at
least 10, 100 or all of the polymorphisms shown in the
Table. Optional additional components of the kit include,
for example, restriction enzymes, reverse transcriptase or
polymerase, the substrate nucleoside triphosphates, means
used to label (for example, an avidin-enzyme conjugate and
enzyme substrate and chromogen if the label is biotin), and
the appropriate buffers for reverse transcription, PCR, or
hybridization reactions. Usually, the kit also contains
instructions for carrying out the methods.

The following Examples are offered for the purpose of
illustrating the present invention and are not to be
construed to limit the scope of this invention. The
teachings of all references cited herein are hereby
incorporated herein by reference.

EXAMPLES

The polymorphisms shown in the Table were identified by
resequencing of target sequences from three to ten
unrelated individuals of diverse ethnic and geographic
backgrounds by hybridization to probes immobilized to
microfabricated arrays or conventional sequencing. The
strategy and principles for design and use of such arrays
are generally described in WO 95/11995. The strategy
provides arrays of probes for analysis of target sequences
showing a high degree of sequence identity to the reference
sequences of the fragments shown in the Table, column 1.
The reference sequences were sequence-tagged sites (STSs)
developed in the course of the Human Genome Project (see,
e.g., Science 270, 1945-1954 (1995); Nature 380, 152-154
(1996)). Most STSs ranged from 100 bp to 300 bp in size.

A typical probe array used in this analysis has two
groups of four sets of probes that respectively tile both
strands of a reference sequence. A first probe set
comprises a plurality of probes exhibiting perfect
complementarity with one of the reference sequences. Each
probe in the first probe set has an interrogation position
that corresponds to a nucleotide in the reference sequence.
That is, the interrogation position is aligned with the
corresponding nucleotide in the reference sequence, when
the probe and reference sequence are aligned to maximize
complementarity between the two. For each probe in the
first set, there are three corresponding probes from three
additional probe sets. Thus, there are four probes
corresponding to each nucleotide in the reference sequence.
The probes from the three additional probe sets are
identical to the corresponding probe from the first probe
set except at the interrogation position, which occurs in
the same position in each of the four corresponding probes
from the four probe sets, and is occupied by a different
nucleotide in the four probe sets. In the present
analysis, probes were 25 nucleotides long. Arrays tiled
for multiple different reference sequences were included
on the same substrate.
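
The four-probe tiling described above can be sketched for one strand as follows. Several details are assumptions made only for illustration: the interrogation position is placed at the center of each 25-mer, probes are written base-by-base against the reference (complement, not reverse complement) so that they line up visually with it, and the function names are hypothetical. The actual array design is described in WO 95/11995.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def complement(seq):
    """Base-by-base complement, kept in the same left-to-right order."""
    return "".join(COMPLEMENT[base] for base in seq)


def tile_probes(reference, probe_length=25):
    """Return {reference position: {interrogation base: probe sequence}}.

    For each position there are four probes. One is perfectly complementary
    to the reference window; the other three differ from it only at the
    central interrogation position.
    """
    if probe_length % 2 == 0:
        raise ValueError("this sketch assumes an odd probe length")
    half = probe_length // 2
    tiling = {}
    for center in range(half, len(reference) - half):
        window = reference[center - half:center + half + 1]
        perfect = complement(window)
        tiling[center] = {
            base: perfect[:half] + base + perfect[half + 1:]
            for base in "ACGT"
        }
    return tiling


if __name__ == "__main__":
    reference = "ACGT" * 10  # hypothetical 40-nucleotide reference
    probes = tile_probes(reference)
    print(len(probes), probes[12])
```

Positions closer than 12 nucleotides to either end of the reference are not interrogated in this sketch, because a full 25-nucleotide window cannot be centered on them.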

Multiple target sequences from an individual were
amplified from human genomic DNA using primers for the
fragments indicated in the listed Web sites. The amplified
target sequences were fluorescently labelled during or
after PCR. The labelled target sequences were hybridized
with a substrate bearing immobilized arrays of probes. The
amount of label bound to probes was measured. Analysis of
the pattern of label revealed the nature and position of
differences between the target and reference sequence. For
example, comparison of the intensities of four
corresponding probes reveals the identity of a
corresponding nucleotide in the target sequences aligned
with the interrogation position of the probes. The
corresponding nucleotide is the complement of the
nucleotide occupying the interrogation position of the
probe showing the highest intensity (see WO 95/11995). The
existence of a polymorphism is also manifested by
differences in normalized hybridization intensities of
probes flanking the polymorphism when the probes hybridized
to corresponding targets from different individuals. For
example, relative loss of hybridization intensity in a
"footprint" of probes flanking a polymorphism signals a
difference between the target and reference (i.e., a
polymorphism) (see EP 717,113). Additionally,
hybridization intensities for corresponding targets from
different individuals can be classified into groups or
clusters suggested by the data, not defined a priori, such
that isolates in a given cluster tend to be similar and
isolates in different clusters tend to be dissimilar.
Hybridizations to samples from different individuals were
performed separately. The Table summarizes the data
obtained for target sequences in comparison with a
reference sequence for the individuals tested.
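
The base-calling rule stated above (the target nucleotide is the complement of the interrogation base of the brightest of four corresponding probes) can be sketched as follows. The intensity values and the function name are hypothetical, and no thresholding, normalization or handling of heterozygous signals is attempted.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def call_base(intensities):
    """Call the target nucleotide from four corresponding probe intensities.

    intensities maps the base at each probe's interrogation position
    to the measured hybridization intensity of that probe.
    """
    brightest = max(intensities, key=intensities.get)
    return COMPLEMENT[brightest]


if __name__ == "__main__":
    # Hypothetical intensities for the four probes at one position.
    signal = {"A": 120.0, "C": 95.0, "G": 2300.0, "T": 110.0}
    print(call_base(signal))
```

Comparing the called base at each position with the reference base gives the candidate differences, which the footprint and clustering analyses described in the text can then corroborate.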

From the foregoing, it is apparent that the invention
includes a number of general uses that can be expressed
concisely as follows. The invention provides for the use
of any of the nucleic acid segments described above in the
diagnosis or monitoring of diseases, such as cancer,
inflammation, heart disease, diseases of the CNS, and
susceptibility to infection by microorganisms. The
invention further provides for the use of any of the
nucleic acid segments in the manufacture of a medicament
for the treatment or prophylaxis of such diseases. The
invention further provides for the use of any of the DNA
segments as a pharmaceutical.

All publications and patent applications cited above
are incorporated by reference in their entirety for all
purposes to the same extent as if each individual
publication or patent application were specifically and
individually indicated to be so incorporated by reference.

EQUIVALENTS

While this invention has been particularly shown and
described with references to preferred embodiments thereof,
it will be understood by those skilled in the art that
various changes in form and details may be made therein
without departing from the spirit and scope of the
invention as defined by the appended claims. Those skilled
in the art will recognize or be able to ascertain using no
more than routine experimentation, many equivalents to the
specific embodiments of the invention described
specifically herein. Such equivalents are intended to be
encompassed in the scope of the claims.

CLAIMS

WE CLAIM: 

1. A nucleic acid segment shown in column 7 of the Table,
or a portion thereof which includes a polymorphic site
or the complement of the segment or portion thereof.

2. The nucleic acid segment of claim 1 that is DNA.

3. The nucleic acid segment of claim 1 that is RNA.

4. The segment of claim 1 that is less than 100 bases.

5. The segment of claim 1 that is less than 50 bases.

6. The segment of claim 1 that is less than 20 bases.

7. The segment of claim 1, wherein the polymorphic site is
biallelic.

8. The segment of claim 1, wherein the polymorphic form
occupying the polymorphic site is the reference base
for the fragment listed in the Table, column 3.

9. The segment of claim 1, wherein the polymorphic form
occupying the polymorphic site is an alternative form
for the fragment listed in the Table, column 4.

10. An allele-specific oligonucleotide that hybridizes to a
segment of a fragment shown in the Table, column 7 or
its complement.

11. The allele-specific oligonucleotide of claim 10 that is
a probe.

12. The allele-specific oligonucleotide of claim 10,
wherein a central position of the probe aligns with the
polymorphic site of the fragment.

13. The allele-specific oligonucleotide of claim 10 that is
a primer.

14. The allele-specific oligonucleotide of claim 13,
wherein the 3' end of the primer aligns with the
polymorphic site of the fragment.

15. The allele-specific oligonucleotide of claim 10, which
is selected from the group consisting of the nucleotide
sequences of the Table, column 5.

16. The allele-specific oligonucleotide of claim 10, which
is selected from the group consisting of the nucleotide
sequences of the Table, column 6.

17. An isolated nucleic acid comprising a sequence of the
Table, column 7 or the complement thereof, wherein the
polymorphic site within the sequence or complement is
occupied by a base other than the reference base shown
in the Table, column 3.

18. A method of analyzing a nucleic acid, comprising
obtaining the nucleic acid from an individual; and
determining a base occupying any one of the polymorphic
sites shown in the Table.

19. The method of claim 18, wherein the determining
comprises determining a set of bases occupying a set of
the polymorphic sites shown in the Table.

20. The method of claim 18, wherein the nucleic acid is
obtained from a plurality of individuals, and a base
occupying one of the polymorphic positions is
determined in each of the individuals, and the method
further comprising testing each individual for the
presence of a disease phenotype, and correlating the
presence of the disease phenotype with the base.



